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Abstract 

Time series are difficult to monitor, summarize and predict. 
Segmentation organizes time series into few intervals having 
uniform characteristics (flatness, linearity, modality, mono- 
tonicity and so on). For scalability, we require fast linear 
time algorithms. The popular piecewise linear model can 
determine where the data goes up or down and at what rate. 
Unfortunately, when the data does not follow a linear model, 
the computation of the local slope creates overfitting. We 
propose an adaptive time series model where the polyno- 
mial degree of each interval vary (constant, linear and so on). 
Given a number of regressors, the cost of each interval is its 
polynomial degree: constant intervals cost 1 regressor, lin- 
ear intervals cost 2 regressors, and so on. Our goal is to 
minimize the Euclidean (I2) error for a given model com- 
plexity. Experimentally, we investigate the model where in- 
tervals can be either constant or linear. Over synthetic ran- 
dom walks, historical stock market prices, and electrocardio- 
grams, the adaptive model provides a more accurate segmen- 
tation than the piecewise linear model without increasing the 
cross-validation error or the running time, while providing 
a richer vocabulary to applications. Implementation issues, 
such as numerical stability and real-world performance, are 
discussed. 

1 Introduction 

Time series are ubiquitous in finance, engineering, and sci- 
ence. They are an application area of growing importance in 
database research [2]. Inexpensive sensors can record data 
points at 5 kHz or more, generating one million samples ev- 
ery three minutes. The primary purpose of time series seg- 
mentation is dimensionality reduction. It is used to find fre- 
quent patterns [24] or classify time series [29]. Segmentation 
points divide the time axis into intervals behaving approxi- 
mately according to a simple model. Recent work on seg- 
mentation used quasi-constant or quasi-linear intervals [27], 
quasi-unimodal intervals [23], step, ramp or impulse [21], or 
quasi-monotonic intervals [10, 16]. A good time series seg- 
mentation algorithm must 

• be fast (ideally run in linear time with low overhead); 



• provide the necessary vocabulary such as flat, increasing 
at rate x, decreasing at rate x, . . . ; 

• be accurate (good model fit and cross-validation). 
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Figure 1 : Segmented electrocardiograms (ECG) with adap- 
tive constant and linear intervals (top) and non-adaptive 
piecewise linear (bottom). Only 150 data points out of 600 
are shown. Both segmentations have the same model com- 
plexity, but we simultaneously reduced the fit and the leave- 
one-out cross-validation error with the adaptive model (top). 

Typically, in a time series segmentation, a single model 
is applied to all intervals. For example, all intervals are as- 
sumed to behave in a quasi-constant or quasi-linear manner. 
However, mixing different models and, in particular, con- 
stant and linear intervals can have two immediate benefits. 
Firstly, some applications need a qualitative description of 
each interval [48] indicated by change of model: is the tem- 



perature rising, dropping or is it stable? In an ECG, we need 
to identify the flat interval between each cardiac pulses. Sec- 
ondly, as we will show, it can reduce the fit error without 
increasing the cross-validation error. Intuitively, a piecewise 
model tells when the data is increasing, and at what rate, and 
vice versa. While most time series have clearly identifiable 
linear trends some of the time, this is not true over all time 
intervals. Therefore, the piecewise linear model locally over- 
fits the data by computing meaningless slopes (see Fig. 1). 

Global overfitting has been addressed by limiting the 
number of regressors [46], but this carries the implicit as- 
sumption that time series are somewhat stationary [38]. 
Some frameworks [48] qualify the intervals where the slope 
is not significant as being "constant" while others look for 
constant intervals within upward or downward intervals [10]. 

Piecewise linear segmentation is ubiquitous and was one 
of the first applications of dynamic programming [7]. We 
argue that many applications can benefit from replacing it 
with a mixed model (piecewise linear and constant). When 
identifying constant intervals a posteriori from a piecewise 
linear model, we risk misidentifying some patterns including 
"stair cases" or "steps" (see Fig. 1). A contribution of this 
paper is experimental evidence that we reduce fit without 
sacrificing the cross-validation error or running time for a 
given model complexity by using adaptive algorithms where 
some intervals have a constant model whereas others have a 
linear model. The new heuristic we propose is neither more 
difficult to implement nor more expensive computationally. 
Our experiments include white noise and random walks as 
well as ECGs and stock market prices. We also compare 
against the dynamic programming (optimal) solution which 
we show can be computed in time 0(n 2 k). 

Performance-wise, common heuristics (piecewise lin- 
ear or constant) have been reported to require quadratic 
time [27]. We want fast linear time algorithms. When the 
number of desired segments is small, top-down heuristics 
might be the only sensible option. We show that if we al- 
low the one-time linear time computation of a buffer, adap- 
tive (and non-adaptive) top-down heuristics run in linear time 
(0(n)). 

Data mining requires high scalability over very large 
data sets. Implementation issues, including numerical stabil- 
ity, must be considered. In this paper, we present algorithms 
that can process millions of data points in minutes, not hours. 

2 Related Work 

Table 1 summarizes the various common heuristics and al- 
gorithms used to solve the segmentation problem with poly- 
nomial models while minimizing the Euclidean (Z 2 ) error. 
The top-down heuristics are described in section 7. When 
the number of data points n is much larger than the num- 
ber of segments k (n ^> k), the top-down heuristics is par- 
ticularly competitive. Terzi and Tsaparas [42] achieved a 



Table 1: Complexity of various segmentation algorithms 
using polynomial models with k segments and n data points, 
including the exact solution by dynamic programming. 

Algorithm Complexity 
Dynamic Programming 0(?i 2 /c) 
Top-Down 0(nk) 
Bottom-Up O(nlogn) [39] or 0(n 2 /k) [27] 
Sliding Windows Q(n) [39] 



complexity of 0(n 4//3 fc 5//3 ) for the piecewise constant model 
by running (n/fc) 2 / 3 dynamic programming routines and us- 
ing weighted segmentations. The original dynamic program- 
ming solution proposed by Bellman [7] ran in time 0(n 3 k), 
and while it is known that a 0(n 2 fc)-time implementation is 
possible for piecewise constant segmentation [42], we will 
show in this paper that the same reduced complexity ap- 
plies for piecewise linear and mixed models segmentations 
as well. 

Except for Pednault who mixed linear and quadratic 
segments [40], we know of no other attempt to segment 
time series using polynomials of variable degrees in the 
data mining and knowledge discovery literature though there 
is related work in the spline and statistical literature [19, 
35, 37] and machine learning literature [3, 5, 8]. The 
introduction of "flat" intervals in a segmentation model has 
been addressed previously in the context of quasi-monotonic 
segmentation [10] by identifying flat subintervals within 
increasing or decreasing intervals, but without concern for 
the cross-validation error. 

While we focus on segmentation, there are many 
methods available for fitting models to continuous vari- 
ables, such as a regression, regression/decision trees, Neu- 
ral Networks [25], Wavelets [14], Adaptive Multivariate 
Splines [19], Free-Knot Splines [35], Hybrid Adaptive 
Splines [37], etc. 

3 Complexity Model 

Our complexity model is purposely simple. The model 
complexity of a segmentation is the sum of the number of 
regressors over each interval: a constant interval has a cost 
of 1, a linear interval a cost of 2 and so on. In other words, 
a linear interval is as complex as two constant intervals. 
Conceptually, regressors are real numbers whereas all other 
parameters describing the model only require a small number 
of bits. 

In our implementation, each regressor counted uses 
64 bits ("double" data type in modern C or Java). There 
are two types of hidden parameters which we discard (see 
Fig. 2): the width or location of the intervals and the number 



of regressors per interval. The number of regressors per in- 
terval is only a few bits and is not significant in all cases. The 
width of the intervals in number of data points can be repre- 
sented using k [log to] bits where to is the maximum length 
of a interval and k is the number of intervals: in the exper- 
imental cases we considered, [log to] < 8 which is small 
compared to 64, the number of bits used to store each regres- 
sor counted. We should also consider that slopes typically 
need to be stored using more accuracy (bits) than constant 
values. This last consideration is not merely theoretical since 
a 32 bits implementation of our algorithms is possible for the 
piecewise constant model whereas, in practice, we require 
64 bits for the piecewise linear model (see proposition 5.1 
and discussion that follows). Experimentally, the piecewise 
linear model can significantly outperform (by m 50%) the 
piecewise constant model in accuracy (see Fig. 11) and vice 
versa. For the rest of this paper, we take the fairness of our 
complexity model as an axiom. 




Figure 2: To describe an adaptive segmentation, you need 
the length and regressors of each interval. 

The desired total number of regressors depends on do- 
main knowledge and the application: when processing ECG 
data, whether we want to have two intervals per cardiac pulse 
or 20 intervals depends on whether we are satisfied with 
the mere identification of the general location of the pulses 
or whether we desire a finer analysis. In some instances, 
the user has no guiding principles or domain knowledge 
from which to choose the number of intervals and a model 
selection algorithm is needed. Common model selection 
approaches such as Bayesian Information Criterion (BIC), 
Minimum Description Length (MDL) and Akaike Informa- 
tion Criterion (AIC) suffer because the possible model com- 
plexity p is large in a segmentation problem (p = n) [11]. 
More conservative model selection approaches such as Risk 
Inflation Criterion [17] or Shrinkage [14] do not directly 
apply because they assume wavelet-like regressors. Cross- 
validation [18], generalized cross-validation [12], and leave- 
one-out cross-validation [45] methods are too expensive. 
However, stepwise regression analysis [9] techniques such 
as permutation tests ("pete") are far more practical [46]. In 
this paper, we assume that the model complexity is known 
either as an input from the user or through model selection. 



4 Time Series, Segmentation Error and Leave-One-Out 

Time series are sequences of data points 
(xo, 2/0)7 ■ ■ ■ j {xn-ii Vn-i) where the x values, the "time" 
values, are sorted: xi > x%-\. In this paper, both the x 
and y values are real numbers. We define a segmentation 
as a sorted set of segmentation indexes zq, . . . , z K such that 
zq = and z K = n. The segmentation points divide the time 
series into intervals S\ , . . . , S K defined by the segmentation 
indexes as Sj = {(xi, yi)\zj^i < i < zj} . Additionally, 
each interval Si , . . . , S K has a model (constant, linear, 
upward monotonic, and so on). 

In this paper, the segmentation error is computed from 
J2j=i Q(Sj) where the function Q is the square of the I2 re- 
gression error. Formally, Q(Sj) = mm p YTr=z- I (p( a; r) — 
y r ) 2 where the minimum is over the polynomials p of a given 
degree. For example, if the interval Sj is said to be constant, 
then Q(Sj) = J2 Zj <i< Zj+1 {Vl ~ V? where V is the average, 
y = J2 Z 1 <i<z z-+i-z- ' Similarly, if the interval has a lin- 
ear model, then p(x) is chosen to be the linear polynomial 
p(x) = ax + b where a and b are found by regression. The 
segmentation error can be generalized to other norms, such 
as the maximum-error (Z^) norm [10, 32] by replacing the 
operators by max operators. 

When reporting experimental error, we use the I2 error 

\jYTj=i Q(Sj)- We only compare time series having a fixed 
number of data points, but otherwise, the mean square error 

should be used: \/ E ^ =lQ(Sj) . 

V n 

If the data follows the model over each interval, 
then the error is zero. For example, given the time se- 
ries (0, 0), (1, 0), (2, 0), (3, 1), (4, 2), we get no error when 
choosing the segmentation indexes zq = 0, 21 = 2, zi = 5 
with a constant model over the index interval [0, 2) and a 
linear model over the index interval [2,5). However, the 
choice of the best segmentation is not unique: we also get 
no error by choosing the alternative segmentation indexes 
z = 0, z\ = 3, z 2 — 5. 

There are two types of segmentation problem: 

• given a bound on the model complexity, find the segmen- 
tation minimizing the segmentation error; 

• given a bound on the segmentation error, find a segmen- 
tation minimizing the model complexity. 

If we can solve efficiently and incrementally one problem 
type, then the second problem type is indirectly solved. 
Because it is intuitively easier to suggest a reasonable bound 
on the model complexity, we focus on the first problem type. 

For applications such as queries by humming [49], it 
is useful to bound the distance between two time series 
using only the segmentation data (segmentation points and 
polynomials over each interval). Let || • || be any norm in 
a Banach space, including the Euclidean distance. Given 



time series y, y', let s(y), s(y') be the piecewise polynomial 
approximations corresponding to the segmentation, then by 
the triangle inequality \\s(y)-s(y')\\ - \\s(y)-y\\ - \\s{y') - 

y'W < Wv-y'W < Hy)-s(y')\\ + \\s( y )-y\\ + \\s(y')- y '\\. 

Hence, as long as the approximation errors are small, \\s(y) — 
y\\ < e and ||s(y') — y'\\ < e, then we have that nearby 
segmentations imply nearby time series (||s(y) — s(y')\\ < 
e =>■ \\y — y'\\ < 3e) and nearby time series imply nearby 
segmentations (\\y — y'\\ < e \\ s {y) — s (y')ll < 3e). This 
result is entirely general. 

Minimizing the fit error is important. On the one 
hand, if we do not assume that the approximation errors are 
small, it is possible for the segmentation data to be identi- 
cal \\s(y) — s(y')\\ = while the distance between the time 
series, \\y — y'\\, is large, causing false positives when identi- 
fying patterns from the segmentation data. For example, the 
sequences 100, —100 and —100, 100 can be both approxi- 
mated by the same fiat model (0,0), yet they are far apart. 
On the other hand, if the fit error is large, similar time series 
can have different segmentations, thus causing false nega- 
tives. For example, the sequences —100, 100, —100.1 and 
— 100.1, 100, —100 have the piecewise flat model approxi- 
mations 0, 0, —100.1 and —100.1, 0, respectively. 

Beside the data fit error, another interesting form of 
error is obtained by cross-validation: divide your data points 
into two sets (training and test), and measure how well your 
model, as fitted over the training set, predicts the test set. We 
predict a missing data point (xi,yi) by first determining the 
interval [zj-i, Zj) corresponding to the data point (x Zji < 
Xi < x Zj ) and then we compute p(xi) where p is the 
regression polynomial over Sj. The error is \p(x{) — 
We opt for the leave-one-out cross-validation where the test 
set is a single data point and the training set is the remainder. 
We repeat the cross-validation over all possible missing data 
points, except for the first and last data point in the time 
series, and compute the mean square error. If computing the 
segmentation takes linear time, then computing the leave- 
one-out error in this manner takes quadratic time, which is 
prohibitive for long time series. 

Naturally, beyond the cross-validation and fit errors, a 
segmentation should provide the models required by the 
application. A rule-based system might require to know 
where the data does not significantly increase or decrease 
and if flat intervals have not been labelled, such queries are 
hard to support elegantly. 

5 Polynomial Fitting in Constant Time 

The naive fit error computation over a given interval takes 
linear time 0(n): solve for the polynomial p and then 
compute J2i(Ui ~ p( x i)) 2 - This has lead other authors 
to conclude that top-down segmentation algorithm such as 
Douglas-Peucker's require quadratic time [27] while we will 
show they can run in linear time. To segment a time series 



into quasi-polynomial intervals in optimal time, we must 
compute fit errors in constant time (0(1)). 

Proposition 5.1. Given a time series j/i)}t=i,...,n. if 
we allow the one-time O(n) computation of a prefix buffer, 
finding the best polynomial fitting the data over the interval 
[x p , Xq] is 0(1). This is true whether we use the Euclidean 
distance (I2) or higher order norms {l r for 00 > r > 2). 

Proof. We prove the result using the Euclidean (I2) norm, 
the proof is similar for higher order norms. 

We begin by showing that polynomial regression can be 
reduced to a matrix inversion problem. Given a polynomial 
^fSg 1 djX 1 *, the square of the Euclidean error is YH=p(Vi ~ 

X^jL't) 1 a j x i) 2 - Setting the derivative with respect to ai 
to zero for / = 0, . . . , TV — 1, generates a system of 
TV equations and TV unknowns, X^o* a J ^2i=p = 
J2i= P Ui x i where I = 0, . . . , TV — 1. On the right-hand- 
side, we have a TV dimensional vector (Vi = J2i= P Ui x i) 
whereas on the left-hand-side, we have the TV x TV Tceplitz 
matrix Au = J2i= P x i multiplied by the coefficients of 
the polynomial (ao, . . . , ojv— 1). That is, we have the matrix- 
vector equation i=0 M,iO-i = Vj. 

As long as TV > q — p, the matrix A is invertible. When 
N < q — p, the solution is given by setting N = q — p and 
letting ai = for i > q — p. Overall, when TV is bounded 
a priori by a small integer, no expensive numerical analysis 
is needed. Only computing the matrix A and the vector V is 
potentially expensive because they involve summations over 
a large number of terms. 

Once the coefficients ao, . . . , un-i are known, we com- 
pute the fit error using the formula: 

q / iV-1 \ 2 N-X 2V-1 q 

- vi) = Yl a 3 ai Y x * +l 

i—p \ j—Q J j—0 1—0 i—p 

N-l q q 

- 2 Y a iY x i yi+ Y y i- 

j—0 i—p i—p 

Again, only the summations are potentially expensive. 

Hence, computing the best polynomial fitting some data 
points over a specific range and computing the corresponding 
fit error in constant time is equivalent to computing range 
sums of the form J2i= P x \v\ m constant time for < i, I < 
2TV. To do so, simply compute once all prefix sums P|'' = 
Si=o x \y\ an d men use their subtractions to compute range 
queries £? =p x\y\ = P^ 1 - P 3 p i v 

Prefix sums speed up the computation of the range sums 
(making them constant time) at the expense of update time 
and storage: if one of the data point changes, we may have to 
recompute the entire prefix sum. More scalable algorithms 



Table 2: Accuracy of the polynomial fitting in constant time 
using 32 bits and 64 bits floating point numbers respectively. 
We give the worse percentage of error over 1000 runs using 
uniformly distributed white noise (n = 200). The domain 
ranges from x = to x = 199 and we compute the fit error 
over the interval [180, 185). 
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are possible if the time series are dynamic [31]. Computing 
the needed prefix sums is only done once in linear time and 
requires (N 2 + N + l)n units of storage (6n units when 
N = 2). For most practical purposes, we argue that we will 
soon have infinite storage so that trading storage for speed is 
a good choice. It is also possible to use less storage [33]. 

When using floating point values, the prefix sum ap- 
proach causes a loss in numerical accuracy which becomes 
significant if x or y values grow large and N > 2 (see Ta- 
ble 2). When N = 1 (constant polynomials), 32 bits floating 
point numbers are sufficient, but for N > 2, 64 bits is re- 
quired. In this paper, we are not interested in higher order 
polynomials and choosing N — 2 is sufficient. 

6 Optimal Adaptive Segmentation 

An algorithm is optimal, if it can find a segmentation with 
minimal error given a model complexity k. Since we can 
compute best fit error in constant time for arbitrary polyno- 
mials, a dynamic programming algorithm computes the op- 
timal adaptive segmentation in time 0(n 2 Nk) where N is 
the upper bound on the polynomial degrees. Unfortunately, 
if N > 2, this result does not hold in practice with 32 bits 
floating point numbers (see Table 2). 

We improve over the classical approach [7] because we 
allow the polynomial degree of each interval to vary. In the 
tradition of dynamic programming [30, pages 261-265], in a 
first stage, we compute the optimal cost matrix (R): R T}P is 
the minimal segmentation cost of the time interval [xq,x p ) 
using a model complexity of r. If E(p, q, d) is the fit error 
of a polynomial of degree d over the time interval [ 
computable in time O(l) by proposition 5.1, then 

R r .q = min Rr-i-d ,p + E(p, q, d) 

0<p<q,0<d<N 

with the convention that R r -i~d,p is infinite when r — 1 — 
d < except for R-i.o = 0. Because computing R r q 
only requires knowledge of the prior rows, R r i : . for r' < r, 
we can compute R row-by-row starting with the first row 



(see Algorithm 1). Once we have computed the r x n + 1 
matrix, we reconstruct the optimal solution with a simple 
O(k) algorithm (see Algorithm 2) using matrices D and P 
storing respectively the best segmentation points and the best 
degrees. 



Algorithm 1 First part of dynamic programming algorithm 
for optimal adaptive segmentation of time series into inter- 
vals having degree 0, . . . , N — 1. 
1: INPUT: Time Series (xi, yi) of length n 
2: INPUT: Model Complexity k and maximum degree N 

(N = 2 constant and linear) 
3: INPUT: Function E(p, q, d) computing fit error with 

poly, of degree d in range [x p , x q ) (constant time) 
4: R, D,P^kxn + l matrices (initialized at 0) 
5: for r G {0, . . .,k — l}do 
6: { r scans the rows of the matrices } 
7: for q 6 {0, ... , n} do 
8: { q scans the columns of the matrices } 
9: Find a minimum of R r _i_d. p + E(p, q, d) and store 
its value in R r . q , and the corresponding d,p tuple 
in D r q ,P rA for < d < min(r + 1,7Y) and 
< p < q + 1 with the convention that R is oo 
on negative rows except for R-i t o = 0. 
10: RETURN cost matrix R, degree matrix D, segmenta- 
tion points matrix P 



Algorithm 2 Second part of dynamic programming algo- 
rithm for optimal adaptive segmentation. 
1: INPUT: k x n + 1 matrices R, D, P from dynamic 
programming algo. 
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x <— p 


10 


RETURN optimal segmentation s 



7 Piecewise Linear or Constant Top-Down Heuristics 

Computing optimal segmentations with dynamic program- 
ming is fl(n 2 ) which is not practical when the size of the 
time series is large. Many efficient online approximate al- 
gorithms are based on greedy strategies. Unfortunately, for 
small model complexities, popular heuristics run in quadratic 
time (0(n 2 )) [27]. Nevertheless, when the desired model 
complexity is small, a particularly effective heuristic is the 
top-down segmentation which proceeds as follows: starting 



with a simple segmentation, we further segment the worst 
interval, and so on, until we exhaust the budget. Keogh 
et al. [27] state that this algorithm has been independently 
discovered in the seventies and is known by several name: 
Douglas-Peucker algorithm, Ramers algorithm, or Iterative 
End-Points Fits. In theory, Algorithm 3 computes the top- 
down segmentation, using polynomial regression of any de- 
gree, in time 0(kn) where k is the model complexity, by 
using fit error computation in constant time. In practice, our 
implementation only works reliably for d = or d = 1 us- 
ing 64 bits floating point numbers. The piecewise constant 
(d = 0) and piecewise linear (d = 1) cases are referred to 
as the "top-down constant" and "top-down linear" heuristics 
respectively. 

Algorithm 3 Top-Down Heuristic. 

INPUT: Time Series (xi, j/i) of length n 

INPUT: Polynomial degree d (d = 0, d = 1, etc.) and 

model complexity k 

INPUT: Function E(p, q) computing fit error with poly. 

in range [x p , x q ) 

S empty list 

S <- (0,n,E(0,n)) 

b^k-d 

while b — d > do 

find tuple (i, j, e) in 5* with maximum last entry 

find minimum of E(i, I) + E(l,j) for I = i + 1, . . . , j 

remove tuple (i, j, e) from S 

insert tuples (i, I, E(i, I)) and (I, j, E(l,j)) in 5* 

b^b-d 
S contains the segmentation 



8 Adaptive Top-Down Segmentation 

Our linear time adaptive segmentation heuristic is based on 
the observation that a linear interval can be replaced by two 
constant intervals without model complexity increase. After 
applying the top-down linear heuristic from the previous 
section (see Algorithm 3), we optimally subdivide each 
interval once with intervals having fewer regressors (such 
as constant) but the same total model complexity. The 
computational complexity is the same, 0((k + l)n). The 
result is Algorithm 4 as illustrated by Fig. 3. In practice, we 
first apply the top-down linear heuristic and then we seek to 
split the linear intervals into two constant intervals. 

Because the algorithm only splits an interval if the fit 
error can be reduced, it is guaranteed not to degrade the 
fit error. However, improving the fit error is not, in itself, 
desirable unless we can also ensure we do not increase the 
cross-validation error. 

An alternative strategy is to proceed from the top-down 
constant heuristic and try to merge constant intervals into 
linear intervals. We chose not to report our experiments with 
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Figure 3: Adaptive Top-Down Segmentation: initially, we 
compute a piecewise linear segmentation, then we further 
subdivide some intervals into constant intervals. 



this alternative since, over our data sets, it gives worse results 
and is slower than all other heuristics. 

9 Implementation and Testing 

Using a Linux platform, we implemented our algorithms 
in C++ using GNU GCC 3.4 and flag "-02". Intervals 
are stored in an STL list object. Source code is available 
from the author. Experiments run on a PC with an AMD 
Athlon 64 (2 GHZ) CPU and enough internal memory so 
that no disk paging is observed. 

Using ECG data and various number of data points, we 
benchmark the optimal algorithm, using dynamic program- 
ming, against the adaptive top-down heuristic: Fig. 4 demon- 
strates that the quadratic time nature of the dynamic pro- 
gramming solution is quite prevalent (t « n 2 /50000 sec- 
onds) making it unusable in all but toy cases, despite a C++ 
implementation: nearly a full year would be required to opti- 
mally segment a time series with 1 million data points! Even 
if we record only one data point every second for an hour, we 
still generate 3,600 data points which would require about 
4 minutes to segment! Computing the leave-one-out error of 
a quadratic time segmentation algorithm requires cubic time: 
to process the numerous time series we chose for this paper, 
days of processing are required. 

We observed empirically that the timings are not sensi- 
tive to the data source. The difference in execution time of 
the various heuristics is negligible (under 15%): our imple- 
mentation of the adaptive heuristic is not significantly more 
expensive than the top-down linear heuristic because its ad- 
ditional step, where constant intervals are created out of lin- 



Algorithm 4 Adaptive Top-Down Heuristic. 
INPUT: Time Series (xi, yi) of length n 
INPUT: Bound on Polynomial degree N and model com- 
plexity k 

INPUT: Function E(p,q, d) computing fit error with poly, 
in range [x p , x q ) 
S empty list 

d<- N - 1 

S <- (0,n,d,E(0,n,d)) 

b^k-d 

while b — d > do 

find tuple (i, j, d, e) in S with maximum last entry 
find minimum of E(i,l,d) + E(l,j,d) for I = i + 

remove tuple e) from S 

insert tuples (i,l,d,E(i,l,d)) and (l,j, d,E(l,j, d)) in 
S 

b^b-d 
for tuple q, e) in S do 

find minimum m of E(i, I, d') + E(l,j, q — d' — Vj for 
I = i + 1,.. . JandO < <f < q- 1 
if m < e then 

remove tuple (£, j, e) from S 
insert tuples (i,l,d',E(i,l,d')) and {l,j,q — d! — 
l,E(l,j, q — d! — 1)) in 5 
5 contains the segmentation 



ear ones, can be efficiently written as a simple sequential 
scan over the time series. To verify the scalability, we gen- 
erated random walk time series of various length with fixed 
model complexity (k = 20), see Fig. 5. 

10 Random Time Series and Segmentation 

Intuitively, adaptive algorithms over purely random data are 
wasteful. To verify this intuition, we generated 10 sequences 
of Gaussian random noise (n = 200): each data point takes 
on a value according to a normal distribution (p = 0, a = 1). 
The average leave-one-out error is presented in Table 3 (top) 
with model complexity k = 10, 20, 30. As expected, the 
adaptive heuristic shows a slightly worse cross-validation 
error. However, this is compensated by a slightly better fit 
error (5%) when compared with the top-down linear heuristic 
(Fig. 6). On this same figure, we also compare the dynamic 
programming solution which shows a 10% reduction in fit 
error for all three models (adaptive, linear, constant/flat), for 
twice the running time (Fig. 4). The relative errors are not 
sensitive to the model complexity. 

Many chaotic processes such as stock prices are some- 
times described as random walk. Unlike white noise, the 
value of a given data point depends on its neighbors. We gen- 
erated 10 sequences of Gaussian random walks (n = 200): 
starting at the value 0, each data point takes on the value 
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Figure 4: Adaptive Segmentation Timings: Time in seconds 
versus the number of data points using ECG data. We 
compare the optimal dynamic programming solution with 
the top-down heuristic (k = 20). 
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Figure 5: Heuristics timings using random walk data: Time 
in seconds versus the number of data points (k = 20). 
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Figure 6: Average Euclidean (I2 norm) fit error over syn- 
thetic white noise data. 
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Figure 7: Average Euclidean (I2 norm) fit error over syn- 
thetic random walk data. 



Table 3: Leave-one-out cross-validation error for top-down 
heuristics on 10 sequences of Gaussian white noise (top) 
and random walks (bottom) (n = 200) for various model 
complexities (fc) 



white noise 


adaptive 


linear 


constant 


linear/adaptive 


k = 10 


1.07 


1.05 


1.06 


98% 


fc = 20 


1.21 


1.17 


1.14 


97% 


fc = 30 


1.20 


1.17 


1.16 


98% 


random walks 


adaptive 


linear 


constant 


linear/adaptive 


fc = 10 


1.43 


1.51 


1.43 


106% 


fc = 20 


1.16 


1.21 


1.19 


104% 


fc = 30 


1.03 


1.06 


1.06 


103% 



V% = Vi-i + N(0, 1) where N(0, 1) is a value from a normal 
distribution (p = 0, a = 1). The results are presented in Ta- 
ble 3 (bottom) and Fig. 7. The adaptive algorithm improves 
the leave-one-out error (3%-6%) and the fit error (« 13%) 
over the top-down linear heuristic. Again, the optimal al- 
gorithms improve over the heuristics by approximately 10% 
and the model complexity does not change the relative errors. 

11 Stock Market Prices and Segmentation 

Creating, searching and identifying stock market patterns is 
sometimes done using segmentation algorithms [48]. Keogh 
and Kasetty suggest [28] that stock market data is indistin- 
guishable from random walks. If so, the good results from 
the previous section should carry over. However, the random 
walk model has been strongly rejected using variance esti- 
mators [36]. Moreover, Sornette [41] claims stock markets 
are akin to physical systems and can be predicted. 

Many financial market experts look for patterns and 
trends in the historical stock market prices, and this approach 
is called "technical analysis" or "charting" [4, 6, 15]. If 
you take into account "structural breaks," some stock mar- 
ket prices have detectable locally stationary trends [13]. De- 
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Figure 8: Average Euclidean (I2 norm) fit error over 14 dif- 
ferent stock prices, for each stock, the error was normalized 
(1 = optimal adaptive segmentation). The complexity is set 
at 20 (fc = 20) but relative results are not sensitive to the 
complexity. 



spite some controversy, technical analysis is a Data Mining 
topic [44, 20, 26]. 

We segmented daily stock prices from dozens of compa- 
nies [1]. Ignoring stock splits, we pick the first 200 trading 
days of each stock or index. The model complexity varies 
k = 10, 20, 30 so that the number of intervals can range from 
5 to 30. We compute the segmentation error using 3 top- 
down heuristics: adaptive, linear and constant (see Table 4 
for some of the result). As expected, the adaptive heuris- 
tic is more accurate than the top-down linear heuristic (the 
gains are between 4% and 11%). The leave-one-out cross- 
validation error is improved with the adaptive heuristic when 
the model complexity is small. The relative fit error is not 
sensitive to the model complexity. We observed similar re- 
sults using other stocks. These results are consistent with 
our synthetic random walks results. Using all of the histori- 
cal data available, we plot the 3 segmentations for Microsoft 
stock prices (see Fig. 9). The line is the regression polyno- 
mial over each interval and only 150 data points out of 5,029 
are shown to avoid clutter. Fig. 8 shows the average fit error 
for all 3 segmentation models: in order to average the re- 
sults, we first normalized the errors of each stock so that the 
optimal adaptive is 1 .0. These results are consistent with the 
random walk results (see Fig. 7) and they indicate that the 
adaptive model is a better choice than the piecewise linear 
model. 

12 ECGs and Segmentation 

Electrocardiograms (ECGs) are records of the electrical volt- 
age in the heart. They are one of the primary tool in screen- 
ing and diagnosis of cardiovascular diseases. The resulting 
time series are nearly periodic with several commonly iden- 
tifiable extrema per pulse including reference points P, Q, R, 
S, and T (see Fig. 10). Each one of the extrema has some 
importance: 



Table 4: Euclidean segmentation error (h norm) and cross-validation error for k = 10, 20, 30: lower is better. 
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Figure 9: Microsoft historical daily stock prices (close) segmented by the adaptive top-down (top), linear top-down (middle), 
and constant top-down (bottom) heuristics. For clarity, only 150 data points out of 5,029 are shown. 




Figure 10: A typical ECG pulse with PQRST reference 
points. 

• a missing P extrema may indicate arrhythmia (abnormal 
heart rhythms); 

• a large Q value may be a sign of scarring; 

• the somewhat fiat region between the S and T points is 
called the ST segment and its level is an indicator of 
ischemia [34]. 

ECG segmentation models, including the piecewise linear 
model [43, 47], are used for compression, monitoring or 
diagnosis. 

We use ECG samples from the MIT-BIH Arrhythmia 
Database [22]. The signals are recorded at a sampling 
rate of 360 samples per second with 11 bits resolution. 
Prior to segmentation, we choose time intervals spanning 
300 samples (nearly 1 second) centered around the QRS 
complex. We select 5 such intervals by a moving window 
in the files of 6 different patients ("100. dat", "101. dat", 
"102.dat", "103.dat", "104.dat", "105.dat"). The model 
complexity varies k = 10, 20, 30. 

The segmentation error as well as the leave-one-out er- 
ror are given in Table 5 for each patient and they are plotted 
in Fig. 1 1 in aggregated form, including the optimal errors. 
With the same model complexity, the adaptive top-down 
heuristic is better than the linear top-down heuristic (>5%), 
but more importantly, we reduce the leave-one-out cross- 
validation error as well for small model complexities. As 
the model complexity increases, the adaptive model eventu- 
ally has a slightly worse cross-validation error. Unlike for 
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Figure 1 1 : Average Euclidean Q2 norm) fit error over ECGs 
for 6 different patients. 



the random walk and stock market data, the piecewise con- 
stant model is no longer competitive in this case. The op- 
timal solution, in this case, is far more competitive with an 
improvement of approximately 30% of the heuristics, but the 
relative results are the same. 

13 Conclusion and Future Work 

We argue that if one requires a multimodel segmentation 
including flat and linear intervals, it is better to segment 
accordingly instead of post-processing a piecewise linear 
segmentation. Mixing drastically different interval models 
(monotonic and linear) and offering richer, more flexible 
segmentation models remains an important open problem. 

To ease comparisons accross different models, we pro- 
pose a simple complexity model based on counting the num- 
ber of regressors. As supporting evidence that mixed mod- 
els are competitive, we consistently improved the accuracy 
by 5% and 13% respectively without increasing the cross- 
validation error over white noise and random walk data. 
Moreover, whether we consider stock market prices of ECG 
data, for small model complexity, the adaptive top-down 
heuristic is noticeably better than the commonly used top- 
down linear heuristic. The adaptive segmentation heuristic 
is not significantly harder to implement nor slower than the 
top-down linear heuristic. 



Table 5: Comparison of top-down heuristics on ECG data 
(n = 200) for various model complexities: segmentation 
error and leave-one-out cross-validation error. 



Fit error for k = 10, 20, 30. 
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We proved that optimal adaptive time series segmenta- 
tions can be computed in quadratic time, when the model 
complexity and the polynomial degree are small. However, 
despite this low complexity, optimal segmentation by dy- 
namic programming is not an option for real-world time se- 
ries (see Fig. 4). With reason, some researchers go as far 
as not even discussing dynamic programming as an alterna- 
tive [27]. In turn, we have shown that adaptive top-down 
heuristics can be implemented in linear time after the linear 
time computation of a buffer. In our experiments, for a small 
model complexity, the top-down heuristics are competitive 
with the dynamic programming alternative which sometimes 
offer small gains (10%). 

Future work will investigate real-time processing for 
online applications such as high frequency trading [48] and 
live patient monitoring. An "amnesic" approach should be 
tested [39]. 
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